Scientific communities across the world have long shown great interest in tracing ancestral relationships among biological species, and Indian scientists today are actively engaged in bioinformatics.
Genome Valley, near Hyderabad, is India's first world-class biotech cluster for genomic research, training and life science activities. It facilitates engineering of hybrids for genomics and proteomics through knowledge parks, SEZs, wet laboratories and incubation centers. The cluster brings together a healthy mix of academic institutes, agricultural research, biopharma, vaccine manufacturing, regulatory, testing and manufacturing activities.
A gene is the basic unit of heredity and carries genetic information in the form of DNA or RNA. The genetic material of all living organisms is built from the same basic chemical compounds, but its make-up and content differ between microbes and higher life forms. Genes contain the inherited information of an organism that gives rise to biologically important structures and functions.
Deciphering the genetic codes of present-day biological species can reveal an ocean of information about evolution and biodiversity. The genetic composition of plants and animals determines their behavior and structure, and therefore classification parameters such as autotrophy and heterotrophy.
The genetic content of microorganisms (bacteria, archaea and eukaryotic microbes) is used to study microbial diversity and to analyze the conservation of evolutionary mechanisms such as biocatalysis and reproduction. The order of biological nomenclature, the scientific classification of living organisms and the system of ranking species are important for understanding the basic principles of microbiology.
Taxonomy is a fundamental basis of applied biology and underpins the study of biodiversity and the conservation of genes. As conservation becomes ever more politically important, taxonomy affects not only the scientific community but society at large.
Taxonomy is the biological classification of organisms, which relates species, genus and family in the form of phylogenetic trees. A phylogenetic tree is similar to a family tree that illustrates our ancestral roots. A kingdom is one of the broadest groupings of living organisms that show a considerable degree of similarity in function, structure and mechanism.
A kingdom is large, inclusive and broadly defined. It is divided into lower categories (taxa), each restricted by certain parameters pertaining to the organisms, such as motility (animals) or its absence (plants). A phylogenetic tree accounts for a number of classifying parameters such as species, phenotype (observed traits), divergence times, clades and succession, as calculated by modern algorithms.
A family tree is a phylogeny that depicts the relationship of an individual (offspring) to its parents and relatives. Evolutionary relationships can be traced through phylogenetic trees starting from one particular organism with any classifying parameter. For example, humans belong to the family Hominidae and the genus Homo, which includes the species H. sapiens and H. neanderthalensis.
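As a rough illustration of tracing relationships through such a tree, the sketch below stores a tiny tree as a child-to-parent map and finds the most recent node two taxa share. The tree contents and the function names `lineage` and `common_ancestor` are assumptions made for this example; real phylogenetic software infers trees from sequence data and works with far larger structures.

```python
# Hypothetical toy tree: each taxon maps to its parent (None marks the root).
PARENT = {
    "Hominidae": None,
    "Homo": "Hominidae",
    "Pan": "Hominidae",
    "H. sapiens": "Homo",
    "H. neanderthalensis": "Homo",
    "P. troglodytes": "Pan",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def common_ancestor(a, b):
    """Return the most recent node shared by the lineages of a and b."""
    ancestors_of_a = set(lineage(a))
    # Walk upwards from b; the first node also found above a is the answer.
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    return None


if __name__ == "__main__":
    print(common_ancestor("H. sapiens", "H. neanderthalensis"))
    print(common_ancestor("H. sapiens", "P. troglodytes"))
```

In this toy tree the two Homo species meet at the genus, while a human and a chimpanzee meet only at the family, which mirrors how rank reflects relatedness.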
Phylogenetic taxonomy is the most recent form of biological nomenclature and classification; it traces observable characteristics that are shared by certain organisms or derived from a common ancestor. Knowledge-representation systems used in phylogenetics are clustered into two categories: predication and conditional.
The conditional knowledge-representation system uses the method of evolutionary taxonomy to scan hierarchical relationships among organisms, for example, species characterized by a particular genus, family, order, class, phylum, kingdom or domain. Predication, by contrast, follows the laws of conditional phenetics for computational analysis, deriving mathematical relationships of gene conservation among organisms, for example, genes encoding parental care among amphibians.
Linnaean classification is the most traditional form of biological taxonomy in use today. The Swedish botanist Carolus Linnaeus, regarded as the father of taxonomy, developed it for living organisms. He devised a conditional, rank-based classification that included binomial nomenclature of organisms, as opposed to the many systematic biological clades (groups) of his era.
The Linnaean system traces a particular organism or species to its genus, family, class and phylum. This system of classification opened new doors in tracing phenotypic and genotypic relationships among species belonging to a particular class or phylum. Modifications to this taxonomy have come with the evolution of scientific techniques such as rDNA technology, which allow biologists to predict or scan DNA sequences and identify target genes for modification, deletion or mutation.
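The rank-based idea can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the `Classification` class, its field names and the `shared_ranks` helper are hypothetical, and the seven ranks listed are the commonly taught ones rather than a complete modern scheme.

```python
from dataclasses import dataclass, astuple

# Commonly taught Linnaean ranks, from most to least inclusive.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


@dataclass(frozen=True)
class Classification:
    """Hypothetical record holding one organism's rank-based classification."""
    kingdom: str
    phylum: str
    class_: str   # trailing underscore because "class" is a Python keyword
    order: str
    family: str
    genus: str
    species: str  # specific epithet only, e.g. "sapiens"

    def binomial(self):
        """Binomial name: genus followed by the specific epithet."""
        return f"{self.genus} {self.species}"

    def shared_ranks(self, other):
        """Ranks, from kingdom downwards, at which two organisms still agree."""
        shared = []
        for rank, mine, theirs in zip(RANKS, astuple(self), astuple(other)):
            if mine != theirs:
                break
            shared.append(rank)
        return shared


human = Classification("Animalia", "Chordata", "Mammalia", "Primates",
                       "Hominidae", "Homo", "sapiens")
chimp = Classification("Animalia", "Chordata", "Mammalia", "Primates",
                       "Hominidae", "Pan", "troglodytes")

if __name__ == "__main__":
    print(human.binomial())
    print(human.shared_ranks(chimp))
```

Comparing the two records shows agreement from kingdom down to family and divergence at genus, which is the kind of relationship the ranked system makes easy to read off.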
Several new theories of evolution and ecology inform modern taxonomy and transgenesis. Evolutionary models prepared by scientists demonstrate a slow and gradual change in primitive life forms that led to the development of higher life forms and to stable inherited characteristics within a particular class or phylum. Many models are built on the theory of succession or pollination, which explains the gradual acquisition of genetic elements that could utilize resources efficiently.
Genes that stabilized certain biological traits among members of a particular class or family eventually survived over their counterparts. This places restrictions on particular taxa and creates an order of ranking of the living organisms that belong to a particular kingdom. For example, the kingdom Plantae is broadly divided into two major groups, flowering plants and non-flowering plants. Each group is further divided into lower classes: non-flowering plants into bryophytes, pteridophytes, sphenophytes, lycopodiophytes and gymnosperms (the last including Coniferopsida, Cycadopsida and Gnetopsida), and flowering plants into monocotyledons and dicotyledons.
Some biological traits offer an advantage in terms of climate resistance or enhanced immunity against particular diseases, depending on the organism's native state of occurrence, climate and geography. Microbial taxonomy forms the basis for identifying unicellular and multicellular microorganisms. Many contain genetic material enclosed within the nucleus of a cell. Some important features of Bergey's system of bacterial taxonomy include parameters such as habitat and environment (air, water, soil), resistance to abiotic factors (sunlight, temperature, water, methane), method of reproduction (sexual or asexual), mode of living (parasitic or otherwise), ability to utilize oxygen, and autotrophy or heterotrophy.
Genetic and proteomic databanks are major bioinformatics resources that classify genes and proteins along with the phylogenetic relationships among microorganisms, enzymes and higher-order taxa. Genes or proteins (enzymes, globins) are grouped by functional similarity with other enzymes and/or other organisms that show homologous behavior. This genetic taxonomy enhances biological interpretation and visualization of shared or derived traits among bacteria, fungi and higher eukaryotes.
Highly complex mathematical and dynamic programming tools provide a functional classification of the genetic information contained in living species. A typical genetic databank can classify more than 75,000 gene annotations from more than 14 annotation sources. Networking tools allow virtual determination of the coding and non-coding elements of a functional gene or protein, and further support determination of coding products and virtual representation of the organism.
Databases developed by different organizations allow screening of biological traits conserved in a particular region of the world or in the endangered species of any country. The sequences and structures of such genes are available from leading bioinformatics resources such as the DNA Data Bank of Japan (DDBJ), the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the Swiss Institute of Bioinformatics (SIB).
These databanks offer interactive programs and functional classification tools, often exposed through web-service interfaces such as SOAP (Simple Object Access Protocol) and REST (Representational State Transfer), which provide a rapid means of identifying a broken fragment of a gene or the amino acid sequence of a protein. The tools organize large lists of genes into functionally related groups to unravel the biological content captured by high-throughput technologies.
Grouping related genes in this way better reflects the nature of biology, in which a given gene may be associated with more than one functional group or genetic cluster. Dynamic programming tools and hidden Markov models classify large gene fragments into functionally related gene groups and rank them according to phylogenetic relations.
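To give a flavor of what dynamic programming means here, the sketch below scores a global alignment of two short sequences in the style of the classic Needleman-Wunsch method and then ranks candidate fragments against a reference. It is an illustration only: the scoring values, the function names and the example sequences are assumptions, and it is not a description of how any particular databank implements its tools.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b (Needleman-Wunsch style).

    The scoring values are illustrative defaults, not a published matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is built from its three neighbours: the dynamic programming step.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap
            gap_in_a = score[i][j - 1] + gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]


def rank_by_similarity(reference, candidates):
    """Sort candidate fragments from most to least similar to the reference."""
    return sorted(candidates,
                  key=lambda seq: global_alignment_score(reference, seq),
                  reverse=True)


if __name__ == "__main__":
    reference = "ACGT"                      # made-up sequences for illustration
    fragments = ["TTTT", "ACGA", "ACGT", "AGT"]
    for fragment in rank_by_similarity(reference, fragments):
        print(fragment, global_alignment_score(reference, fragment))
```

Production tools add substitution matrices, affine gap penalties and heuristics for speed, but the table-filling idea is the same.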
The Protein Data Bank (PDB) and the Expert Protein Analysis System (ExPASy) are leading protein analysis resources. Virtual representation of proteins and 3-D molecular graphics through programs such as RasMol and Chime allow visualization of crystallographic data, tertiary structure and molecular geometry.
The International Committee of Genetic Resource Planning (ICGRP) developed guidelines recommending genetic symbols and nomenclature in 1957. ICGRP recognized the need for guidelines on genetic codes and gene names during approval of the Human Genome Project. Similar guidelines were cited by other scientific research communities working on diverse genetic resources from insect, microbial, plant or mammalian phyla. For many genes and their corresponding proteins, an assortment of alternate names is in use across the scientific literature and public-domain biological databases. Gene taxonomy therefore poses a challenge to the effective organization and exchange of biological information in the modern era.
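The problem of alternate names can be made concrete with a small lookup that maps each alias to one agreed symbol. The sketch below is a minimal illustration: the `ALIASES` table lists only a few well-known aliases for two human genes, and the helper names `build_lookup` and `normalize` are assumptions for this example rather than part of any nomenclature database.

```python
# Illustrative alias table: approved symbol -> a few alternate names in use.
ALIASES = {
    "TP53": {"p53"},
    "ERBB2": {"HER2", "NEU"},
}


def build_lookup(aliases):
    """Build a case-insensitive map from every known name to its symbol."""
    lookup = {}
    for symbol, names in aliases.items():
        lookup[symbol.upper()] = symbol
        for name in names:
            lookup[name.upper()] = symbol
    return lookup


LOOKUP = build_lookup(ALIASES)


def normalize(name):
    """Return the agreed symbol for a gene name, or None if it is unknown."""
    return LOOKUP.get(name.strip().upper())


if __name__ == "__main__":
    for raw in ["p53", " her2 ", "TP53", "mystery1"]:
        print(repr(raw), "->", normalize(raw))
```

Returning `None` for unrecognized names, rather than guessing, keeps ambiguous entries visible so that a curator can resolve them.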
Scientists familiar with a particular gene family are working together and experimenting to revise the nomenclature of gene sets, creating new information for modifying microorganisms such as bacteria and yeast.
This process works by creating artificial cloning vectors (plasmids) that introduce a new gene (transgene) into another organism to give it a desired feature such as antibiotic resistance, enhanced immunity or another observable character. This process is termed transgenesis.
Transgenesis is a modern concept of genetic engineering that allows the artificial introduction of a specific gene, one that offers an advantage over the wild type, to impart a specific biological trait. The transgene (foreign DNA) is introduced into the germ cell lines (sex cells) of crops and animals by some natural process to create a transgenic organism.
Transgenic organisms have benefits over wild-type organisms, such as abiotic resistance, and can be used to increase the yield or productivity of crops and vegetables in agriculture. Transgenic microbes and animals are prepared for many reasons, including: (i) studying genetic changes during development, (ii) creating new species with enhanced characteristics such as resistance to diseases, and (iii) developing vaccines, immunogenic compounds and other biologically important compounds.
Transgenic plants are given special characteristics such as resistance to pests and insects, tolerance to herbicides for better weed control, resistance to drought conditions and increased nutritional value. The transgene may replicate with the genetic element of the organism or remain attached as an extrachromosomal element that imparts the desired character to cells when it is expressed. Genetic codes have been identified in some bacteria and yeast that code for essential products within the cell and impart this important function.
Genetic nomenclature (see figure 1) includes genes for DNA replication, repair and recombination. It also includes genes for transcription and translation, signal transduction, amino acid and protein metabolism, nucleotide metabolism, immune function, ion transport and many more. Modern techniques of genetic and protein engineering in biotechnology are game-changing strategies for applying the pooled knowledge about living organisms.
(The author is MD of VMG Biotech Consultants, New Delhi, a premier biotechnology consultancy and Contract Research Organization)